Biostatistics For Dummies (Monika Wahi John Pezzullo)

Your software may offer one or more of the following goodness-of-fit measures:

A measure of agreement between the observed and predicted outcomes called concordance (see the bottom of Figure 23-4). Concordance indicates the extent to which participants with higher predicted hazard values had shorter observed survival times, which is what you’d expect. Figure 23-4 shows a concordance of 0.642 for this regression.

An r (or r²) value that’s interpreted like a correlation coefficient in ordinary regression, meaning the larger the r² value, the better the model fits the data. In Figure 23-4, r² (labeled Rsquare) is 0.116.

A likelihood ratio test and associated p value that compares the full model, which includes all the parameters, to a model consisting of just the overall baseline survival function. In Figure 23-4, the likelihood ratio p value is shown in scientific notation, and it is small enough to indicate that a model that includes the CenterCD and Radiation variables can predict survival statistically significantly better than just the overall (baseline) survival curve.

Akaike’s Information Criterion (AIC) is especially useful for comparing alternative models but is not included in Figure 23-4.
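To make these four measures concrete, here is a minimal Python sketch that computes each one from quantities a PH fit produces: each participant’s predicted hazard, and the log-likelihoods of the baseline-only (null) model and the full model. All function names and numbers are hypothetical, not taken from Figure 23-4. The r² shown is the Cox-Snell likelihood-ratio version, which is one common definition; your software may use a different formula. The concordance skips pairs with tied times, a simplification that real implementations handle more carefully. The chi-square tail probability is written out in closed form (exact for integer degrees of freedom, which equal the number of coefficients estimated) so the sketch needs only the standard library.

```python
import math


def concordance_index(times, events, risk):
    """Harrell-style C: among comparable pairs, the share in which the
    participant who died sooner had the higher predicted hazard.
    Pairs with tied times are skipped (a simplification)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored time can't anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # j was still alive when i died
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:  # even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum of x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def likelihood_ratio_test(loglik_null, loglik_full, df):
    """LR statistic and p value: full model vs. baseline-only model."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, df)


def cox_snell_r2(loglik_null, loglik_full, n):
    """One likelihood-based r-squared: 1 - exp(-LR / n)."""
    stat = 2.0 * (loglik_full - loglik_null)
    return 1.0 - math.exp(-stat / n)


def aic(loglik, n_coefs):
    """Akaike's Information Criterion; smaller is better."""
    return 2 * n_coefs - 2 * loglik


if __name__ == "__main__":
    # Hypothetical toy data, not the data behind Figure 23-4
    times = [5, 8, 12, 20, 25]
    events = [1, 1, 0, 1, 0]          # 1 = died, 0 = censored
    risk = [2.1, 1.7, 1.9, 0.8, 0.5]  # predicted hazard (linear predictor)
    print(f"concordance = {concordance_index(times, events, risk):.3f}")

    ll_null, ll_full, n, k = -100.0, -95.0, 50, 2  # hypothetical log-likelihoods
    stat, p = likelihood_ratio_test(ll_null, ll_full, k)
    print(f"LR = {stat:.1f}, p = {p:.2e} ({p:.6f})")
    print(f"r-squared = {cox_snell_r2(ll_null, ll_full, n):.3f}")
    print(f"AIC null = {aic(ll_null, 0):.1f}, AIC full = {aic(ll_full, k):.1f}")
```

With these made-up numbers the script prints a concordance of 0.875, a p value of 6.74e-03 (0.006738), an r² of 0.181, and AIC values of 200.0 and 194.0, so the full model would be preferred. Because a PH model has no intercept, the null model here counts zero coefficients for AIC.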

Focusing on baseline survival and hazard functions

The baseline survival function is represented as a table with two columns (time and predicted survival) and a row for each distinct time at which one or more events were observed.
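To show where such a table comes from, here is a minimal Python sketch that builds a zero-participant baseline survival table with the Breslow estimator, one common choice. It assumes the regression coefficients have already been estimated; the function name and data are hypothetical, and your software may use a different estimator or an average-participant baseline, so its numbers need not match this sketch.

```python
import math


def breslow_baseline(times, events, covariates, coefs):
    """Zero-participant baseline survival table via the Breslow estimator.

    Returns one (time, S0) row per distinct time with at least one event.
    """
    # Relative hazard exp(b1*x1 + b2*x2 + ...) for each participant
    rel_hazard = [math.exp(sum(b * x for b, x in zip(coefs, row))) for row in covariates]
    event_times = sorted({t for t, e in zip(times, events) if e})
    table, cum_hazard = [], 0.0
    for t in event_times:
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        # Everyone whose observed time is >= t is still at risk at t
        at_risk = sum(h for ti, h in zip(times, rel_hazard) if ti >= t)
        cum_hazard += deaths / at_risk  # baseline cumulative hazard H0(t)
        table.append((t, math.exp(-cum_hazard)))  # S0(t) = exp(-H0(t))
    return table


if __name__ == "__main__":
    # Hypothetical data: one predictor whose coefficient is assumed to be 0.7
    times = [6, 10, 10, 15, 22, 30]
    events = [1, 1, 0, 1, 0, 1]  # 1 = died, 0 = censored
    covariates = [[1], [0], [1], [1], [0], [0]]
    for t, s in breslow_baseline(times, events, covariates, coefs=[0.7]):
        print(f"{t:>4}  {s:.3f}")
```

Censored times (22 here) get no row of their own, which is why the table has one row per distinct event time rather than one per participant.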

The baseline survival function’s table may have hundreds of rows for large data sets, so instead of printing it, you should save the table as a data file. Then you can use it to generate a customized prognosis curve (described in the next section) for any specific set of values for the predictor variables.
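Saving and reusing the table can be sketched in Python with the standard `csv` module. In this sketch `io.StringIO` stands in for a real file (in practice you’d open something like a hypothetical `baseline.csv` with `newline=""`), the baseline values are made up, and the lookup assumes survival stays constant between event times, which matches a table that has rows only at event times.

```python
import bisect
import csv
import io


def save_baseline(table, fh):
    """Write (time, survival) rows with a header line."""
    writer = csv.writer(fh)
    writer.writerow(["time", "survival"])
    writer.writerows(table)


def load_baseline(fh):
    """Read the saved table back as a list of (time, survival) floats."""
    return [(float(r["time"]), float(r["survival"])) for r in csv.DictReader(fh)]


def baseline_at(table, t):
    """Step-function lookup: S0 at the last event time <= t (1.0 before the first event)."""
    times = [row[0] for row in table]
    idx = bisect.bisect_right(times, t)
    return 1.0 if idx == 0 else table[idx - 1][1]


if __name__ == "__main__":
    buf = io.StringIO()
    save_baseline([(6.0, 0.85), (10.0, 0.70), (15.0, 0.52)], buf)  # hypothetical table
    buf.seek(0)
    loaded = load_baseline(buf)
    print(baseline_at(loaded, 12))  # 0.7: last event time at or before 12 is 10
```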

The software may also offer a graph of the baseline survival function. If your software is using an average-participant baseline (see the earlier section, “The steps to perform a PH regression”), this graph is useful as an indicator of the entire group’s overall survival. But if your software uses a zero-participant baseline, the curve is not helpful.

How Long Have I Got, Doc? Constructing Prognosis Curves

A primary reason to use regression analysis is to predict outcomes from any particular set of predictor values. For survival analysis, you can use the regression coefficients from a PH regression along with the baseline survival curve to construct an expected survival (prognosis) curve for any set of predictor values.
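Under the proportional-hazards assumption, a participant’s survival curve is the baseline curve raised to the power of that participant’s hazard ratio relative to the baseline participant: S(t | x) = S0(t) ** exp(sum of b_i * (x_i - c_i)), where c_i is 0 for a zero-participant baseline and the predictor mean for an average-participant baseline. Here is a minimal Python sketch of that calculation; the function name, coefficients, predictor values, and baseline table are all hypothetical.

```python
import math


def prognosis_curve(baseline, coefs, x, means=None):
    """Predicted survival curve for one set of predictor values.

    baseline: list of (time, S0) rows from the saved baseline table.
    means:    predictor means if the baseline is an average-participant
              baseline; leave as None for a zero-participant baseline.
    """
    center = means if means is not None else [0.0] * len(coefs)
    # Linear predictor relative to the baseline participant
    lp = sum(b * (xi - c) for b, xi, c in zip(coefs, x, center))
    hazard_ratio = math.exp(lp)
    # Proportional hazards: S(t | x) = S0(t) ** hazard_ratio
    return [(t, s ** hazard_ratio) for t, s in baseline]


if __name__ == "__main__":
    baseline = [(6, 0.85), (10, 0.70), (15, 0.52)]  # hypothetical average-participant baseline
    coefs = [0.03, 0.8]  # hypothetical coefficients for two predictors
    for t, s in prognosis_curve(baseline, coefs, x=[65, 1], means=[60, 0.5]):
        print(t, round(s, 3))
```

A participant with a hazard ratio above 1 gets a curve that drops faster than the baseline; one whose predictor values equal the baseline participant’s gets the baseline curve back unchanged.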

Suppose that you’re studying survival time (from diagnosis to death) for a group of cancer patients in which the predictors are age, tumor stage, and tumor grade at the time of diagnosis. You’d run a PH regression on your data and have the program generate the baseline survival curve as a table of times and survival